
Fix: count all tool tokens in budget including deferred tools #4990

Closed

bhavyaus wants to merge 1 commit into main from dev/bhavyau/fix-summarization-empty-prompt

Conversation


@bhavyaus (Contributor) commented Apr 6, 2026

No description provided.

Fix: count all tool tokens in budget including deferred tools

Deferred tools (defer_loading: true) still count against the API context
window. The 3/30 change (#4834) excluded them from toolTokens, causing
the message budget to be ~31K tokens too generous and leading to
context_length_exceeded errors followed by summarization failures
("No messages provided").

- Count all tools in agentIntent budget calculation
- Reserve tool token budget in summarization prompt rendering
- Add modelMaxPromptTokens to summarization telemetry
- Add priority to summarization UserMessage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
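The budget change the commit message describes can be sketched as follows. This is a hypothetical illustration, not the extension's real API: the `Tool` shape, token counts, and function names are stand-ins chosen to show why excluding deferred tools inflated the message budget.

```typescript
// Deferred tools still occupy context-window space on the API side, so all
// tool schemas must count against the prompt budget.

interface Tool {
	name: string;
	schemaTokens: number;
	deferLoading?: boolean; // corresponds to defer_loading: true
}

// Before (#4834): deferred tools were excluded, inflating the message budget.
function messageBudgetBefore(maxPromptTokens: number, tools: Tool[]): number {
	const toolTokens = tools
		.filter(t => !t.deferLoading)
		.reduce((sum, t) => sum + t.schemaTokens, 0);
	return maxPromptTokens - toolTokens;
}

// After (this PR): every tool counts, deferred or not.
function messageBudgetAfter(maxPromptTokens: number, tools: Tool[]): number {
	const toolTokens = tools.reduce((sum, t) => sum + t.schemaTokens, 0);
	return maxPromptTokens - toolTokens;
}

const tools: Tool[] = [
	{ name: 'read_file', schemaTokens: 500 },
	{ name: 'rare_tool', schemaTokens: 31000, deferLoading: true },
];
console.log(messageBudgetBefore(128000, tools)); // 127500 — ~31K too generous
console.log(messageBudgetAfter(128000, tools));  // 96500
```

With the illustrative numbers above, the old calculation hands the messages a budget that is 31K tokens larger than what the context window can actually hold once the deferred tool's schema is sent, which is how requests ended up with context_length_exceeded.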
Copilot AI review requested due to automatic review settings April 6, 2026 04:17
@bhavyaus bhavyaus closed this Apr 6, 2026

Copilot AI left a comment


Pull request overview

This PR updates agent prompt budgeting to count all tool schema tokens (including deferred tools) against the model context window, and adjusts conversation summarization so its prompt rendering reserves token budget for tools. It also extends summarization telemetry and adds a unit test capturing a zero-messages rendering edge case.

Changes:

  • Revert tool token counting in AgentIntentInvocation to include deferred tools when computing the message budget.
  • Reserve tool token budget when rendering the summarization prompt in Full mode, and add modelMaxPromptTokens to summarization telemetry.
  • Add a unit test reproducing the “No messages provided” failure mode via an empty rendered prompt.
Summary per file:

  • src/extension/prompts/node/agent/test/summarization.spec.tsx — Adds a repro test where summarization prompt rendering produces zero messages under an extremely small token budget.
  • src/extension/prompts/node/agent/summarizedConversationHistory.tsx — Reserves message budget for tools in Full summarization mode; tweaks message priority; adds modelMaxPromptTokens to telemetry.
  • src/extension/intents/node/agentIntent.ts — Counts tool tokens across all available tools (no deferral filtering) and removes tool-deferral plumbing from the invocation.

Copilot's findings

Comments suppressed due to low confidence (1)

src/extension/prompts/node/agent/summarizedConversationHistory.tsx:689

  • After rendering the summarization prompt, summarizationPrompt can legitimately be empty (0 messages) when the token budget is too small (see the new repro test). The current code proceeds to makeChatRequest2 with messages=[], which will fail validation (“No messages provided”) and may prevent a clean fallback path while also producing noisy telemetry. Add an explicit guard after render (e.g. if summarizationPrompt.length===0, throw a BudgetExceededError or a dedicated error) to skip the request and force the intended fallback/handling.
```typescript
let summarizationPrompt: ChatMessage[];
const associatedRequestId = this.props.promptContext.conversation?.getLatestTurn().id;
try {
	summarizationPrompt = (await renderPromptElement(this.instantiationService, endpoint, ConversationHistorySummarizationPrompt, { ...propsInfo.props, simpleMode: mode === SummaryMode.Simple }, undefined, this.token)).messages;
	this.logInfo(`summarization prompt rendered in ${stopwatch.elapsed()}ms.`, mode);
} catch (e) {
	const budgetExceeded = e instanceof BudgetExceededError;
	const outcome = budgetExceeded ? 'budget_exceeded' : 'renderError';
	this.logInfo(`Error rendering summarization prompt in mode: ${mode}. ${e.stack}`, mode);
	this.sendSummarizationTelemetry(outcome, '', this.props.endpoint.model, mode, stopwatch.elapsed(), undefined);
```
  • Files reviewed: 3/3 changed files
  • Comments generated: 2
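The guard the reviewer suggests might look like the following. This is a minimal, self-contained sketch: `ChatMessage`, `BudgetExceededError`, and `guardSummarizationPrompt` are stand-ins for the extension's own types, and the exact error class to throw is the reviewer's suggestion, not something this PR implements.

```typescript
// Stand-in for the extension's BudgetExceededError.
class BudgetExceededError extends Error {}

type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

function guardSummarizationPrompt(messages: ChatMessage[]): ChatMessage[] {
	// An empty prompt would otherwise reach the chat request and fail API
	// validation with "No messages provided"; failing fast here lets callers
	// hit the intended budget_exceeded fallback instead of a noisy request error.
	if (messages.length === 0) {
		throw new BudgetExceededError('summarization prompt rendered 0 messages');
	}
	return messages;
}
```

The existing catch block already maps `BudgetExceededError` to the `budget_exceeded` telemetry outcome, so throwing the same class from the guard would reuse that fallback path unchanged.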

```diff
@@ -379,7 +378,6 @@ export class AgentIntentInvocation extends EditCodeIntentInvocation implements I
 	@IExperimentationService private readonly expService: IExperimentationService,
 	@IAutomodeService private readonly automodeService: IAutomodeService,
 	@IOTelService override readonly otelService: IOTelService,
```

Copilot AI Apr 6, 2026


AgentIntentInvocation’s constructor no longer accepts IToolDeferralService, but several subclasses (e.g. AskAgentIntentInvocation/EditCode2IntentInvocation/NotebookEditorIntentInvocation) still inject toolDeferralService and pass it to super(...). This will fail TypeScript compilation due to an argument count mismatch. Update those subclasses to remove the extra DI parameter + super argument (and remove now-unused imports) to keep constructors consistent.

Suggested change:

```diff
 	@IOTelService override readonly otelService: IOTelService,
+	_toolDeferralService?: unknown,
```
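The compilation hazard the reviewer describes can be reduced to a small sketch. The class names mirror the review comment, but the service types and constructor shapes here are simplified stand-ins without the real DI decorators:

```typescript
// When a base-class constructor parameter is removed, any subclass still
// forwarding that argument to super() fails to compile (TS2554: Expected N
// arguments, but got N+1).
interface IOTelService { trace(msg: string): void; }

class AgentIntentInvocation {
	// The IToolDeferralService parameter was removed by this PR.
	constructor(protected readonly otelService: IOTelService) {}
}

class AskAgentIntentInvocation extends AgentIntentInvocation {
	constructor(otelService: IOTelService) {
		// Before the fix, subclasses passed a second toolDeferralService
		// argument here; the fix is to drop it from both the subclass
		// constructor and the super() call.
		super(otelService);
	}
}

const invocation = new AskAgentIntentInvocation({ trace: () => {} });
```

The suggested `_toolDeferralService?: unknown` parameter would instead keep the old arity as an ignored optional argument, avoiding the need to touch every subclass.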

Comment on lines +668 to +678
```typescript
// In Full mode, tools are sent alongside the summarization prompt with
// tool_choice: 'none'. Reserve budget for them so the rendered messages
// plus tools don't exceed the model's context window.
const tools = this.props.tools;
const toolTokens = mode === SummaryMode.Full && tools?.length
	? await this.props.endpoint.acquireTokenizer().countToolTokens(tools)
	: 0;
const endpoint = toolTokens > 0
	? this.props.endpoint.cloneWithTokenOverride(
		Math.max(1, Math.floor((this.props.endpoint.modelMaxPromptTokens - toolTokens) * 0.9)))
	: this.props.endpoint;
```

Copilot AI Apr 6, 2026


modelMaxPromptTokens telemetry is documented as “the … budget used for the summarization prompt rendering”, but the value sent is this.props.endpoint.modelMaxPromptTokens (the pre-reservation budget). Since getSummary may clone the endpoint with a reduced token budget after reserving tool tokens, telemetry will be misleading. Consider reporting the effective budget actually used for rendering (e.g. the cloned endpoint’s modelMaxPromptTokens / computed message budget), or report both original and effective budgets.
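Reporting both budgets, as the reviewer suggests, could look like the sketch below. The telemetry shape and function name are hypothetical; only the reservation math (`(max - toolTokens) * 0.9`, floored, at least 1) is taken from the diff above.

```typescript
// Hypothetical telemetry payload carrying both the original endpoint budget
// and the effective budget actually used for prompt rendering.
interface SummarizationBudgetTelemetry {
	modelMaxPromptTokens: number;     // pre-reservation endpoint budget
	effectiveMaxPromptTokens: number; // budget after reserving tool tokens
}

function budgetTelemetry(originalMax: number, toolTokens: number): SummarizationBudgetTelemetry {
	// Mirrors the reservation math from the diff: when tool tokens are
	// reserved, the cloned endpoint gets (max - toolTokens) * 0.9.
	const effective = toolTokens > 0
		? Math.max(1, Math.floor((originalMax - toolTokens) * 0.9))
		: originalMax;
	return { modelMaxPromptTokens: originalMax, effectiveMaxPromptTokens: effective };
}

console.log(budgetTelemetry(128000, 31000));
// { modelMaxPromptTokens: 128000, effectiveMaxPromptTokens: 87300 }
```

Emitting both fields keeps the documented meaning of `modelMaxPromptTokens` intact while making it possible to see from telemetry how much budget the rendered messages actually had.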

